fix: add node health check to incr/decr before mutate operation by AustinWheel · Pull Request #186 · Netflix/EVCache

AustinWheel · 2026-03-03T17:38:57Z

incr/decr operations bypassed the ensureWriteQueueSize check that other
write paths (set, add, touch) use, causing them to block on unreachable
or stalled nodes for the full mutateOperationTimeout duration. A single
unhealthy node could cause all incr/decr calls to block, exhausting the
caller's thread pool.

This adds the same ensureWriteQueueSize gate to incr/decr. If the target
node is inactive or its write queue is full, the operation returns -1
immediately, consistent with the documented API contract. The existing
reconciliation logic in EVCacheImpl.incr()/decr() already handles -1 as
a zone failure and will sync the value from healthy zones on the next
successful operation.

incr/decr operations bypassed the ensureWriteQueueSize check that other write paths (set, add, touch) use, causing them to block on unreachable or stalled nodes for the full mutateOperationTimeout duration. A single unhealthy node could cause all incr/decr calls to block, exhausting the caller's thread pool. This adds the same ensureWriteQueueSize gate to incr/decr. If the target node is inactive or its write queue is full, the operation returns -1 immediately, consistent with the documented API contract. The existing reconciliation logic in EVCacheImpl.incr()/decr() already handles -1 as a zone failure and will sync the value from healthy zones on the next successful operation.

evcache-core/src/main/java/com/netflix/evcache/pool/EVCacheClient.java

… write path - EVCacheImpl.decr: fix reconciliation to pick the minimum non-(-1) value across nodes instead of the maximum. For decr, the most up-to-date node has the lowest value (most decremented), so the old max-pick logic would overwrite correctly decremented nodes with stale higher values. - EVCacheClient.ensureWriteQueueSize: add isAvailable() check before entering the retry/sleep loop. This fast-fails writes to inactive nodes instead of blocking request threads through 3 retry iterations. Affects all write operations that go through ensureWriteQueueSize (incr, decr, set, delete, etc.).

Sunjeet · 2026-03-10T23:28:36Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Reviewed the following changes:

Node health check (ensureWriteQueueSize) added to incr/decr in EVCacheClient.java -- consistent with existing pattern used by all other write operations
Inactive node early-exit added to ensureWriteQueueSize -- correctly avoids retry loop for already-disconnected nodes
Decrement reconciliation fix in EVCacheImpl.java -- correctly selects minimum (most-decremented) value across zones instead of maximum

🤖 Generated with Claude Code

_{- If this code review was useful, please react with 👍. Otherwise, react with 👎.}

AustinWheel requested review from Sunjeet, akashdeepgoel and shy-1234 March 3, 2026 17:38

shy-1234 reviewed Mar 5, 2026

View reviewed changes

evcache-core/src/main/java/com/netflix/evcache/pool/EVCacheClient.java Show resolved Hide resolved

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: add node health check to incr/decr before mutate operation#186

fix: add node health check to incr/decr before mutate operation#186
AustinWheel wants to merge 2 commits intomasterfrom
awheeler/fix/incr-decr-tolerance

AustinWheel commented Mar 3, 2026

Uh oh!

Uh oh!

Sunjeet commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

AustinWheel commented Mar 3, 2026

Uh oh!

Uh oh!

Sunjeet commented Mar 10, 2026

Code review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants